Imputation of missing single nucleotide polymorphism genotypes using a multivariate mixed model framework.
نویسندگان
چکیده
The objective of this paper was to investigate, for various scenarios at low and high marker density, the accuracy of imputing genotypes when using a multivariate mixed model framework using information from 2, 4, or 10 surrounding markers. This model predicts genotypes at a locus, using genotypes at nearby loci as correlated traits, and the additive genetic relationship matrix to use information from genotyped relatives. For 2 scenarios this method was compared with the population-based imputation algorithms FastPHASE and Beagle. Accuracies of imputation were obtained with Monte Carlo simulation and predicted with selection index theory, using input from the simulated data. Five different scenarios of missing genotypes were considered: 1) genotypes of some loci are missing due to genotyping errors, 2) juvenile selection candidates are genotyped using a smaller SNP panel, 3) some animals in the pedigree of a breeding population are not genotyped, 4) juvenile selection candidates are not genotyped, and 5) 1 generation of animals in the top of the pedigree are not genotyped. Surrounding marker information did not improve accuracy of imputation when animals whose genotypes were imputed were not genotyped for those surrounding markers. When those animals were genotyped for surrounding markers, results indicated a limited gain when linkage disequilibrium (LD) between SNP was low, but a substantial increase in accuracy when LD between SNP was high. For scenario 1, using 1 vs. 11 SNP, accuracy was respectively 0.75 and 0.81 at low, and 0.75 and 0.93 at high density. For scenario 2, using 1 vs. 11 SNP, accuracy was, respectively, 0.70 and 0.73 at low, and 0.71 and 0.84 at high density. Beagle outperformed the other methods at high SNP density, whereas the multivariate mixed model was clearly superior when SNP density was low and animals where genotyped with a reduced SNP panel. The results showed that extending the univariate gene content method to a multivariate BLUP model with inclusion of surrounding marker information only yields greater imputation accuracy when the animals with imputed loci are at least genotyped for some SNP that are in LD with the SNP to be imputed. The equation derived from selection index theory accurately predicted the accuracy of imputation using the multivariate mixed model framework.
منابع مشابه
Criteria of GenCall score to edit marker data and methods to handle missing markers have an influence on accuracy of genomic predictions
The aim of this study was to investigate the effect of different strategies for handling lowquality or missing data on prediction accuracy for direct genomic values of protein yield, mastitis and fertility using a Bayesian variable model and a GBLUP model in the Danish Jersey population. The data contained 1 071 Jersey bulls that were genotyped with the Illumina Bovine 50K chip. After prelimina...
متن کاملGenetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment
Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, ...
متن کاملStatistical Inference for Hardy-Weinberg Proportions in the Presence of Missing Genotype Information
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for H...
متن کاملImputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites
Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. f...
متن کاملBias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation
Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred ex...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of animal science
دوره 89 7 شماره
صفحات -
تاریخ انتشار 2011